I. The capacity and the entropy bound

Let $\mathcal{H}$ be a Hilbert space providing a quantum-mechanical description of the physical carrier of information. A simple model of a quantum communication channel consists of the input alphabet $A = \{1, \dots, a\}$ and a mapping $i \to S_i$ from the input alphabet to the set of quantum states in $\mathcal{H}$. A quantum state is a density operator, i.e. a positive operator $S$ in $\mathcal{H}$ with unit trace, $\operatorname{Tr} S = 1$. Sending a letter $i$ results in producing the signal state $S_i$ of the information carrier.

As in the classical case, the input is described by an a priori probability distribution $\pi = \{\pi_i\}$ on $A$. At the receiving end of the channel a quantum measurement is performed. Mathematically it is described by a resolution of identity in $\mathcal{H}$, that is, by a family $X = \{X_j\}$ of positive operators in $\mathcal{H}$ satisfying $\sum_j X_j = I$, where $I$ is the unit operator in $\mathcal{H}$ [Holevo 1973]. By definition, the probability of the output $j$ conditioned upon the input $i$ is $P(j|i) = \operatorname{Tr} S_i X_j$. The classical case is embedded into this picture by assuming that all operators in question commute, hence are diagonal in some basis labelled by an index $\omega$. Taking $S_i = \operatorname{diag}[S(\omega|i)]$ and $X_j = \operatorname{diag}[X(j|\omega)]$, we obtain a classical channel with transition probabilities $S(\omega|i)$ and the classical decision rule $X(j|\omega)$, so that $P(j|i) = \sum_\omega X(j|\omega)S(\omega|i)$. We call such a channel quasiclassical.



The Shannon information is given by the usual formula

$$I_1(\pi, X) = \sum_i \sum_j \pi_i P(j|i) \log \frac{P(j|i)}{\sum_k \pi_k P(j|k)}. \tag{1}$$
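To make $P(j|i) = \operatorname{Tr} S_i X_j$ and formula (1) concrete, here is a minimal Python sketch (NumPy assumed available). It evaluates the conditional probabilities and the Shannon information for a toy qubit channel. The two signal states, the computational-basis measurement and the use of base-2 logarithms are illustrative assumptions; none of these choices comes from the text.

```python
import numpy as np

def outcome_probs(states, povm):
    """P[i, j] = Tr(S_i X_j): probability of output j given input letter i."""
    return np.array([[np.real(np.trace(S @ X)) for X in povm] for S in states])

def shannon_info(prior, P):
    """Shannon information I_1(pi, X) of formula (1), in bits."""
    prior = np.asarray(prior, dtype=float)
    q = prior @ P  # output distribution: sum_k pi_k P(j|k)
    total = 0.0
    for i, pi_i in enumerate(prior):
        for j, p_ji in enumerate(P[i]):
            if pi_i > 0 and p_ji > 0:  # zero-probability terms contribute 0
                total += pi_i * p_ji * np.log2(p_ji / q[j])
    return float(total)

def ket(theta):
    """Real qubit vector cos(theta)|0> + sin(theta)|1>."""
    return np.array([np.cos(theta), np.sin(theta)])

# Illustrative pure signal states S_i = |psi_i><psi_i|.
states = [np.outer(ket(t), ket(t)) for t in (0.0, np.pi / 8)]
# Measurement in the computational basis: X_0 = |0><0|, X_1 = |1><1|.
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

P = outcome_probs(states, povm)
print(P)                              # rows are P(.|i)
print(shannon_info([0.5, 0.5], P))
```

If all $S_i$ and $X_j$ in this sketch are diagonal, it reproduces the quasiclassical case, where `outcome_probs` reduces to $\sum_\omega X(j|\omega)S(\omega|i)$.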



Let $H(S) = -\operatorname{Tr} S \log S$ denote the von Neumann entropy of a state $S$; we assume that $H(S_i) < \infty$. If $\pi = \{\pi_i\}$ is an a priori distribution on $A$, we denote

$$S_\pi = \sum_{i \in A} \pi_i S_i, \qquad H_\pi(S_{(\cdot)}) = \sum_{i \in A} \pi_i H(S_i)$$

and

$$\Delta H(\pi) = H(S_\pi) - H_\pi(S_{(\cdot)}).$$
The famous quantum entropy bound says that

$$\sup_X I_1(\pi, X) \le \Delta H(\pi), \tag{2}$$

with equality achieved if and only if all the operators $\pi_i S_i$ commute. The inequality was explicitly conjectured in [Gordon 1964]. It was discussed in [Forney 1963], [Levitin 1969] and elsewhere in the context of conventional quantum measurement theory. The first published proof appeared in [Holevo 1973]. This bound is also closely related to the fundamental property that quantum relative entropy decreases under completely positive maps, developed in [Lindblad 1973-1975] and in [Uhlmann 1977]. See [Yuen and Ozawa 1993] for the history and some generalizations of the entropy bound.
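The entropy bound (2) can be checked numerically on a small example. The sketch below (Python with NumPy, an assumption) computes $\Delta H(\pi)$ and compares it with $I_1(\pi, X)$ maximized over real projective qubit measurements on a grid of angles. Restricting the search to projective measurements, and the particular states chosen, are simplifying assumptions, so the search only gives a lower estimate of $\sup_X I_1(\pi, X)$.

```python
import numpy as np

def vn_entropy(rho):
    """von Neumann entropy H(S) = -Tr S log2 S, from the eigenvalues."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]  # convention 0 log 0 = 0
    return float(-np.sum(w * np.log2(w)))

def holevo_quantity(prior, states):
    """Delta H(pi) = H(sum_i pi_i S_i) - sum_i pi_i H(S_i)."""
    avg = sum(p * S for p, S in zip(prior, states))
    return vn_entropy(avg) - sum(p * vn_entropy(S) for p, S in zip(prior, states))

def shannon_info(prior, states, povm):
    """I_1(pi, X) of formula (1) with P(j|i) = Tr S_i X_j, in bits."""
    P = np.array([[np.real(np.trace(S @ X)) for X in povm] for S in states])
    q = np.asarray(prior) @ P
    return float(sum(prior[i] * P[i, j] * np.log2(P[i, j] / q[j])
                     for i in range(len(prior)) for j in range(len(povm))
                     if prior[i] > 0 and P[i, j] > 0))

def basis_povm(phi):
    """Projective measurement in the real orthonormal basis rotated by phi."""
    b0 = np.array([np.cos(phi), np.sin(phi)])
    b1 = np.array([-np.sin(phi), np.cos(phi)])
    return [np.outer(b0, b0), np.outer(b1, b1)]

theta = np.pi / 8
kets = [np.array([1.0, 0.0]), np.array([np.cos(theta), np.sin(theta)])]
states = [np.outer(v, v) for v in kets]
prior = [0.5, 0.5]

bound = holevo_quantity(prior, states)
best = max(shannon_info(prior, states, basis_povm(phi))
           for phi in np.linspace(0.0, np.pi, 721))
print(best, bound)  # best projective measurement found vs. right side of (2)
```

For the two non-orthogonal pure states used here, the best measurement found stays well below $\Delta H(\pi)$. For commuting (diagonal) states, the computational-basis measurement attains $\Delta H(\pi)$, in line with the equality condition stated after (2).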

In the same way we can consider the product channel in the tensor product Hilbert space $\mathcal{H}^{\otimes n} = \mathcal{H} \otimes \dots \otimes \mathcal{H}$. Its input alphabet consists of words $w = (i_1, \dots, i_n)$ of length $n$, and the word $w$ corresponds to the density operator $S_w = S_{i_1} \otimes \dots \otimes S_{i_n}$. If $\pi$ is an a priori distribution on $A^n$ and $X$ is a resolution of identity in $\mathcal{H}^{\otimes n}$, we define the information quantity $I_n(\pi, X)$ by a formula similar to (1). Denoting $C_n = \sup_{\pi, X} I_n(\pi, X)$, we have the superadditivity property $C_n + C_m \le C_{n+m}$, hence the following limit exists:

$$C = \lim_{n \to \infty} C_n / n.$$

This limit is called the capacity of the initial channel [Holevo 1979]. The definition is justified by a fact easily deduced from the classical Shannon coding theorem: $C$ is the least upper bound of the rate (bits/symbol) of information that can be transmitted with asymptotically vanishing error. More precisely, a code $(W, X)$ of size $M$ is a sequence $(w^1, X_1), \dots, (w^M, X_M)$, where the $w^k$ are words of length $n$ and $\{X_k\}$ is a family of positive operators in $\mathcal{H}^{\otimes n}$ satisfying $\sum_{k=1}^M X_k \le I$. Defining $X_0 = I - \sum_{k=1}^M X_k$, we obtain a resolution of identity in $\mathcal{H}^{\otimes n}$. The average error probability for such a code is

$$\lambda(W, X) = \frac{1}{M} \sum_{k=1}^{M} \left[1 - \operatorname{Tr} S_{w^k} X_k\right]. \tag{3}$$

Let $p(M, n)$ denote the minimum of this error probability over all codes of size $M$ with words of length $n$. Then, for $\delta > 0$, $p(2^{n(C-\delta)}, n) \to 0$ and $p(2^{n(C+\delta)}, n) \not\to 0$ as $n \to \infty$.
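Once the signal states and decision operators are fixed, the average error probability (3) of a code is straightforward to evaluate. The following Python sketch (NumPy assumed) builds $S_w$ as a tensor product, checks that $\sum_k X_k \le I$, and averages $1 - \operatorname{Tr} S_{w^k}X_k$. The two-word code and its decision operators are hypothetical choices for illustration, not an optimal code.

```python
import numpy as np
from functools import reduce

def word_state(word, letter_states):
    """S_w = S_{i_1} (x) ... (x) S_{i_n} for a word w = (i_1, ..., i_n)."""
    return reduce(np.kron, [letter_states[i] for i in word])

def average_error(words, decision_ops, letter_states):
    """lambda(W, X) = (1/M) sum_k [1 - Tr S_{w^k} X_k], formula (3)."""
    dim = decision_ops[0].shape[0]
    X0 = np.eye(dim) - sum(decision_ops)  # the remaining operator X_0
    if np.linalg.eigvalsh(X0).min() < -1e-10:
        raise ValueError("decision operators must satisfy sum_k X_k <= I")
    errors = [1.0 - np.real(np.trace(word_state(w, letter_states) @ X))
              for w, X in zip(words, decision_ops)]
    return float(np.mean(errors))

theta = np.pi / 8
kets = [np.array([1.0, 0.0]), np.array([np.cos(theta), np.sin(theta)])]
letters = [np.outer(v, v) for v in kets]
words = [(0, 0), (1, 1)]                 # toy code: M = 2 words of length n = 2
e00 = np.kron([1.0, 0.0], [1.0, 0.0])    # basis vector |00>
X1 = np.outer(e00, e00)                  # decide w^1 on outcome |00>
X2 = np.eye(4) - X1                      # decide w^2 otherwise, so X_0 = 0
print(average_error(words, [X1, X2], letters))
```

Here $X_1$ is the projection onto $|00\rangle$, so the first word is never misdecoded. The second word is misdecoded with probability $|\langle 00|\psi_{w^2}\rangle|^2 = \cos^4\theta$, which gives $\lambda = \cos^4\theta/2$.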

Applied to $I_n(\pi, X)$ and combined with the additivity and continuity properties of $\Delta H(\pi)$, the entropy bound (2) implies $C \le \max_\pi \Delta H(\pi) = \bar C$. Thus

$$C_1 \le C \le \bar C.$$

For a classical channel $C_n = nC_1$, and all three quantities coincide. A striking feature of the quantum case is that the inequality $C_1 < C$ is possible, implying strict superadditivity of the information quantities $C_n$ [Holevo 1979]. In a sense, the channels that are the analog of classical memoryless channels carry a kind of "quantum memory". This fact is another manifestation of "quantum nonseparability", and in a sense it is dual to the existence of Einstein-Podolsky-Rosen correlations. The latter are due to entangled (non-factorizable) states and hold for disentangled measurements, while superadditivity is due to entangled measurements and holds for disentangled states.

The inequality $C_1 < C$ raised the problem of the actual value of the capacity $C$. A possible conjecture was $C = \bar C$, but a proof came only recently. It was first obtained for pure state (noiseless) channels in the paper of Hausladen, Jozsa, Schumacher, Westmoreland and Wootters [Hausladen et al. 1996], and then for arbitrary signal states in [Holevo 1996] and in [Schumacher and Westmoreland 1997]. Since the entropy bound (2) and the classical weak converse provide the converse of the quantum coding theorem, the main problem was to prove the direct coding theorem, i.e. the inequality $C \ge \bar C$.



II. The pure state channel 

Following Dirac's formalism, we denote vectors of $\mathcal{H}$ as $|\psi\rangle$ and the Hermitian conjugate vectors of the dual space as $\langle\psi|$. Then $\langle\phi|\psi\rangle$ is the inner product of $|\phi\rangle$ and $|\psi\rangle$, and $|\psi\rangle\langle\phi|$ is the outer product, i.e. the rank-one operator $A$ acting on a vector $|\chi\rangle$ as $A|\chi\rangle = |\psi\rangle\langle\phi|\chi\rangle$. If $|\psi\rangle$ is a unit vector, then $|\psi\rangle\langle\psi|$ is the orthogonal projection onto $|\psi\rangle$. This is a special density operator, representing a pure state of the system. Pure states are precisely the extreme points of the convex set of all states. An arbitrary state can be represented as a mixture of pure states, i.e. obtained by imposing classical randomness on pure states. In this sense pure states are "noiseless": they contain no classical source of randomness.

Let us consider a pure state channel with $S_i = |\psi_i\rangle\langle\psi_i|$. Since the entropy of a pure state is zero, $\Delta H(\pi) = H(S_\pi)$ for such a channel. If a decision rule $X = \{X_j\}$ is applied at the output, then $P(j|i) = \langle\psi_i|X_j|\psi_i\rangle$. A system $\{|\phi_j\rangle\}$ of (unnormalized) vectors is called overcomplete if $\sum_j |\phi_j\rangle\langle\phi_j| = I$. Every overcomplete system gives rise to the decision rule $X_j = |\phi_j\rangle\langle\phi_j|$, for which $P(j|i) = |\langle\psi_i|\phi_j\rangle|^2$.

The first step in getting a lower bound for the capacity $C$ is geometric. It amounts to obtaining a tractable upper bound for the average error (3) minimized over all decision rules. Sending a word $w = (i_1, \dots, i_n)$ produces the tensor product vector $\psi_w = \psi_{i_1} \otimes \dots \otimes \psi_{i_n} \in \mathcal{H}^{\otimes n}$. Let $(W, X)$ be a code of size $M$. Restrict for a while to the subspace of $\mathcal{H}^{\otimes n}$ generated by the vectors $\psi_{w^1}, \dots, \psi_{w^M}$. On it, consider the Gram matrix $\Gamma(W) = [\langle\psi_{w^i}|\psi_{w^j}\rangle]$ and the Gram operator $G(W) = \sum_{k=1}^M |\psi_{w^k}\rangle\langle\psi_{w^k}|$. This operator has the matrix $\Gamma(W)$ with respect to the overcomplete system

$$|\hat\phi_{w^k}\rangle = G(W)^{-1/2}|\psi_{w^k}\rangle, \qquad k = 1, \dots, M. \tag{4}$$

The resolution of identity of the form

$$X_k = |\hat\phi_{w^k}\rangle\langle\hat\phi_{w^k}| \tag{5}$$

approximates the quantum maximum likelihood decision rule, which in general cannot be found explicitly. The necessary normalizing factor $G(W)^{-1/2}$ is a major source of analytical difficulties in the noncommutative case. Note that the vectors $\psi_{w^1}, \dots, \psi_{w^M}$ need not be linearly independent. For linearly independent coherent state vectors, (5) is related to the "suboptimal receiver" described in [Helstrom 1976], Sec. VI.3(e). It was shown in [Holevo 1978] that this decision rule yields the upper bound

$$\min_X \lambda(W, X) \le \frac{1}{M}\operatorname{Sp}\bigl(E - \Gamma(W)^{1/2}\bigr)^2 = 2\Bigl(1 - \frac{1}{M}\operatorname{Sp}\Gamma(W)^{1/2}\Bigr), \tag{6}$$

where $E$ is the unit $M \times M$ matrix and $\operatorname{Sp}$ is the trace of an $M \times M$ matrix. This bound is "tight" in the sense that a similar lower bound exists. However, the square root of the Gram matrix makes it difficult to use.
A simpler but coarser bound follows from the operator inequality $(E - \Gamma(W)^{1/2})^2 \le (E - \Gamma(W))^2$:

$$\min_X \lambda(W, X) \le \frac{1}{M}\operatorname{Sp}\bigl(E - \Gamma(W)\bigr)^2 = \frac{1}{M}\sum_{r=1}^{M}\sum_{s \ne r}\operatorname{Tr} S_{w^r} S_{w^s}. \tag{7}$$

As shown in [Holevo 1978], in the limit of "almost orthogonal" states, $\Gamma(W) \to E$, this bound is asymptotically equivalent (up to the factor 1/4) to the tight bound (6). On the other hand, different words are "decoupled" in (7), which makes it suitable for random coding.
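For a concrete code, the decision rule (4), (5) and the bounds (6), (7) depend only on the Gram matrix $\Gamma(W)$. Under (5), word $k$ is decoded correctly with probability $|\langle\psi_{w^k}|G(W)^{-1/2}|\psi_{w^k}\rangle|^2 = [\Gamma(W)^{1/2}]_{kk}^2$. The Python sketch below uses this identity to compare the exact error of rule (5) with the bounds (6) and (7). It assumes NumPy, and the random test vectors are an arbitrary choice.

```python
import numpy as np

def psd_sqrt(A):
    """Square root of a positive semidefinite Hermitian matrix via eigh."""
    w, V = np.linalg.eigh(A)
    w = np.clip(w, 0.0, None)  # remove tiny negative round-off
    return (V * np.sqrt(w)) @ V.conj().T

def pgm_error_and_bounds(vectors):
    """Columns of `vectors` are unit vectors psi_{w^1}, ..., psi_{w^M}.
    Returns (exact error of rule (4)-(5), bound (6), coarse bound (7))."""
    M = vectors.shape[1]
    gram = vectors.conj().T @ vectors          # Gamma(W)
    root = psd_sqrt(gram)                      # Gamma(W)^{1/2}
    # Success probability of word k equals (Gamma^{1/2})_kk ** 2.
    exact = 1.0 - np.mean(np.real(np.diag(root)) ** 2)
    E = np.eye(M)
    bound6 = np.real(np.trace((E - root) @ (E - root))) / M
    bound7 = np.real(np.trace((E - gram) @ (E - gram))) / M
    return float(exact), float(bound6), float(bound7)

rng = np.random.default_rng(0)
vecs = rng.normal(size=(4, 3)) + 1j * rng.normal(size=(4, 3))
vecs /= np.linalg.norm(vecs, axis=0)           # normalize each column
print(pgm_error_and_bounds(vecs))
```

Since $1 - x^2 \le 2(1 - x)$ for $0 \le x \le 1$, the exact error never exceeds (6). By the operator inequality used above, (6) never exceeds (7).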

Just as in the classical case, we assume that the words $w^1, \dots, w^M$ are chosen at random, independently, with the probability distribution

$$P\{w^k = (i_1, \dots, i_n)\} = \pi_{i_1} \cdots \pi_{i_n}. \tag{8}$$

Then for each word $w$ the expectation is

$$\mathsf{E}\, S_w = S_\pi^{\otimes n}, \tag{9}$$

and taking the expectation of the coarse bound (7), by the independence of $w^1, \dots, w^M$, we obtain

$$p(M, n) \le \mathsf{E}\min_X \lambda(W, X) \le (M-1)\operatorname{Tr}\bigl(S_\pi^{\otimes n}\bigr)^2 = (M-1)\,2^{-n(-\log \operatorname{Tr} S_\pi^2)}.$$

Denoting

$$\tilde C = -\log\min_\pi \operatorname{Tr} S_\pi^2 = -\log\min_\pi \sum_{i,j}\pi_i\pi_j|\langle\psi_i|\psi_j\rangle|^2, \tag{10}$$

we conclude that $C \ge \tilde C$. In some cases (e.g. the pure state binary channel) $\tilde C > C_1$, so this suffices to establish $C > C_1$, and hence the strict superadditivity of $C_n$ [Holevo 1979]. It does not suffice to prove the coding theorem, however, since $\tilde C < \bar C$ unless the channel is quasiclassical. Ban, Hirota, Kato, Osaki and Suzuki [Kato et al. 1996] made a detailed comparison of the quantities $C_1$, $\tilde C$ for different quantum channels. The quantity $\tilde C$ was discussed in [Holevo 1979] and [Stratonovich and Vantsjan 1978]. Its real information-theoretic meaning, however, becomes clear only in connection with the quantum reliability function (see (15) below).
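For the binary pure state channel with real overlap $c = \langle\psi_1|\psi_2\rangle$, both $\tilde C$ from (10) and $\bar C = \max_\pi H(S_\pi)$ reduce to one-dimensional optimizations, because $S_\pi$ has eigenvalues $\bigl(1 \pm \sqrt{1 - 4\pi_1\pi_2(1 - c^2)}\bigr)/2$. The Python sketch below computes both quantities. It assumes NumPy, and uses a grid search and base-2 logarithms for simplicity. It also checks by exact enumeration the identity $\mathsf{E}\operatorname{Tr} S_w S_{w'} = (\operatorname{Tr} S_\pi^2)^n$ for independent words drawn from (8), which is the step behind the random coding bound in the text.

```python
import itertools
import math
import numpy as np

def binary_entropy(p):
    """h(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def c_tilde_and_c_bar(c, grid=2001):
    """Binary pure state channel with real overlap c = <psi_1|psi_2>.
    Returns (C~ of (10), C-bar = max_pi H(S_pi)) by grid search over pi_1."""
    best_tilde, best_bar = 0.0, 0.0
    for p in np.linspace(0.0, 1.0, grid):
        q = 1.0 - p
        tr_s2 = p * p + q * q + 2 * p * q * c ** 2         # Tr S_pi^2
        disc = math.sqrt(max(0.0, 1.0 - 4 * p * q * (1.0 - c ** 2)))
        lam = 0.5 * (1.0 + disc)                            # larger eigenvalue of S_pi
        best_tilde = max(best_tilde, -math.log2(tr_s2))
        best_bar = max(best_bar, binary_entropy(lam))
    return best_tilde, best_bar

def expected_overlap(prior, kets, n):
    """E Tr S_w S_v for independent words w, v drawn from (8), by enumeration."""
    letters = range(len(prior))
    total = 0.0
    for w in itertools.product(letters, repeat=n):
        for v in itertools.product(letters, repeat=n):
            weight = math.prod(prior[i] for i in w) * math.prod(prior[j] for j in v)
            amp = math.prod(float(np.dot(kets[i], kets[j])) for i, j in zip(w, v))
            total += weight * amp ** 2          # |<psi_w|psi_v>|^2 = Tr S_w S_v
    return total

print(c_tilde_and_c_bar(0.6))   # (C~, C-bar) in bits
```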

The proof of the inequality $C \ge \bar C$ given in [Hausladen et al. 1996] improves the approximate maximum likelihood decision rule by projecting onto the "typical subspace" of the density operator $S_\pi^{\otimes n}$, and modifies the coarse bound for the error probability accordingly. Eliminating the "non-typical" (and hence far from orthogonal) components of the signal state vectors thus compensates for the coarseness of the bound. More precisely, fix a small positive $\delta$, and let $\lambda_j$ be the eigenvalues and $|e_j\rangle$ the eigenvectors of $S_\pi$. Then the eigenvalues and eigenvectors of $S_\pi^{\otimes n}$ are $\lambda_J = \lambda_{j_1}\cdots\lambda_{j_n}$ and $|e_J\rangle = |e_{j_1}\rangle\otimes\dots\otimes|e_{j_n}\rangle$, where $J = (j_1, \dots, j_n)$. The spectral projector onto the typical subspace is defined as

$$P = \sum_{J \in B}|e_J\rangle\langle e_J|,$$

where $B = \{J : 2^{-n[H(S_\pi)+\delta]} < \lambda_J < 2^{-n[H(S_\pi)-\delta]}\}$. This concept plays a central role in "quantum data compression" [Jozsa and Schumacher 1994]. In a more mathematical context, a similar notion appeared in [Ohya and Petz 1993], Theorem 1.18. Its application to the present problem relies on two basic properties. First, by definition,

$$\|S_\pi^{\otimes n}P\| \le 2^{-n[H(S_\pi)-\delta]}. \tag{11}$$

Second, for fixed small positive $\epsilon$ and large enough $n$,

$$\operatorname{Tr} S_\pi^{\otimes n}(I - P) \le \epsilon. \tag{12}$$

This holds because $B$ consists of the sequences $J$ that are typical, in the sense of classical information theory [Gallager 1968], [Cover and Thomas 1991], for the probability distribution given by the eigenvalues $\lambda_j$. By the law of large numbers, the total probability $\sum_{J \in B}\lambda_J = \operatorname{Tr} S_\pi^{\otimes n}P$ of this set tends to 1 as $n \to \infty$.
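The properties (11) and (12) are easy to see numerically when $S_\pi$ is a qubit state with eigenvalues $(a, 1 - a)$. An eigenvalue $\lambda_J$ of $S_\pi^{\otimes n}$ then depends only on the number $k$ of factors equal to $a$, and has multiplicity $\binom{n}{k}$. The pure-Python sketch below uses arbitrarily chosen parameters and sums over $k$ in the log domain to avoid underflow. It returns the typical mass $\operatorname{Tr} S_\pi^{\otimes n}P$, the dimension of the typical subspace, and $\log_2$ of the largest eigenvalue kept, which can be compared with $-n[H(S_\pi) - \delta]$.

```python
import math

def typical_subspace_stats(a, n, delta):
    """Qubit S_pi with eigenvalues (a, 1 - a), 0 < a < 1. Returns
    (H(S_pi), Tr S_pi^{(x)n} P, dim P, log2 of the largest eigenvalue in B)."""
    H = -(a * math.log2(a) + (1 - a) * math.log2(1 - a))
    mass, dim, top_log2 = 0.0, 0, -math.inf
    for k in range(n + 1):                  # k = number of factors equal to a
        log2_lam = k * math.log2(a) + (n - k) * math.log2(1 - a)
        if -n * (H + delta) < log2_lam < -n * (H - delta):     # J belongs to B
            log2_mult = (math.lgamma(n + 1) - math.lgamma(k + 1)
                         - math.lgamma(n - k + 1)) / math.log(2)  # log2 C(n, k)
            mass += 2.0 ** (log2_mult + log2_lam)
            dim += math.comb(n, k)
            top_log2 = max(top_log2, log2_lam)
    return H, mass, dim, top_log2

for n in (10, 100, 1000):
    H, mass, dim, top = typical_subspace_stats(0.8, n, 0.1)
    print(n, round(mass, 4), top <= -n * (H - 0.1), math.log2(dim) <= n * (H + 0.1))
```

The typical mass grows toward 1 with $n$, which is (12). The largest retained eigenvalue respects (11) by construction.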



Replace the signal state vectors $|\psi_{w^k}\rangle$ by the unnormalized vectors $|\tilde\psi_{w^k}\rangle = P|\psi_{w^k}\rangle$, define the corresponding approximate maximum likelihood decision rule, and denote by $\tilde\Gamma(W)$ the corresponding Gram matrix. Then [Hausladen et al. 1996] obtain the modified upper bound

$$\min_X \lambda(W, X) \le \frac{1}{M}\Bigl\{\operatorname{Sp}\bigl(E - \tilde\Gamma(W)\bigr) + \operatorname{Sp}\bigl(E - \tilde\Gamma(W)\bigr)^2\Bigr\}.$$

Applying random coding and using (9) together with the properties (11), (12) of the typical subspace, one gets for large $n$

$$p(M, n) \le 2\epsilon + (M-1)\,2^{-n[H(S_\pi)-\delta]},$$

which yields the inequality $C \ge \bar C$.

It is known, however, that in classical information theory the coding theorem can be proved without typical sequences, using only clever estimates of the error probability [Gallager 1968]. Moreover, this approach gives the exponential rate of convergence of the error probability, the so-called reliability function

$$E(R) = \limsup_{n \to \infty}\frac{1}{n}\log\frac{1}{p(2^{nR}, n)}, \qquad 0 < R < C. \tag{13}$$

This suggests applying the random coding procedure directly to the tight bound (6) in the quantum case, as done in [Burnashev and Holevo 1997]. Rather remarkably, the expectation can be calculated explicitly:



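$$\frac{2}{M}\,\mathsf{E}\operatorname{Sp}\bigl(E - \Gamma(W)^{1/2}\bigr) = \operatorname{Tr} f\bigl(S_\pi^{\otimes n}\bigr),$$

where

$$f(z) = \frac{2}{M}\Bigl[1 - \sqrt{1 + (M-1)z} + (M-1)\bigl(1 - \sqrt{1 - z}\bigr)\Bigr].$$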



This function strangely resembles the expression for the Bayes error in the "equiangular" case [Helstrom 1976], rel. (VI.2.10), although it does not coincide with it. The function $f(z)$ admits the standard estimates

$$f(z) \le z\min\{(M-1)z,\, 2\} \le 2(M-1)^s z^{1+s}, \qquad 0 < s \le 1,$$

which lead to the following result [Burnashev and Holevo 1997].

Theorem 1. For all $M$, $n$ and $0 < s \le 1$,

$$\mathsf{E}\min_X\lambda(W, X) \le 2(M-1)^s\bigl[\operatorname{Tr} S_\pi^{1+s}\bigr]^n. \tag{14}$$

It is natural to introduce the function $\mu(\pi, s)$, analogous to the corresponding function in classical information theory [Gallager 1968], Sec. 5.6:

$$\mu(\pi, s) = -\log\operatorname{Tr} S_\pi^{1+s}.$$

Then

$$E(R) \ge \max_{0 < s \le 1}\max_\pi\bigl(\mu(\pi, s) - sR\bigr) = E_r(R).$$
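Theorem 1 turns into a computable exponent. For the binary pure state channel with real overlap $c$, the Python sketch below evaluates $\mu(\pi, s) = -\log\operatorname{Tr} S_\pi^{1+s}$ and the lower bound $E_r(R)$. It assumes NumPy; base-2 logarithms and grid searches over $s$ and $\pi$ are simplifying assumptions.

```python
import math
import numpy as np

def mu(p, c, s):
    """mu(pi, s) = -log2 Tr S_pi^{1+s} for two pure states with real overlap c."""
    q = 1.0 - p
    disc = math.sqrt(max(0.0, 1.0 - 4 * p * q * (1.0 - c ** 2)))
    eigs = [(1 + disc) / 2, (1 - disc) / 2]          # eigenvalues of S_pi
    return -math.log2(sum(x ** (1 + s) for x in eigs if x > 0))

def random_coding_exponent(R, c, grid=201):
    """E_r(R) = max over 0 < s <= 1 and pi of mu(pi, s) - s R, by grid search."""
    best = 0.0                                         # the supremum is >= 0 (s -> 0)
    for s in np.linspace(0.0, 1.0, grid)[1:]:
        for p in np.linspace(0.0, 1.0, grid):
            best = max(best, mu(p, c, s) - s * R)
    return best

for R in (0.0, 0.2, 0.5, 0.8):
    print(R, random_coding_exponent(R, c=0.6))
```

The function $\mu(\pi, s)$ is concave in $s$, with $\mu(\pi, 0) = 0$ and $\mu'(\pi, 0) = H(S_\pi)$. Consequently, the computed $E_r(R)$ vanishes for $R \ge \bar C$, and at $R = 0$ it equals $\max_\pi\mu(\pi, 1) = \tilde C$.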
On the other hand, the "expurgation" technique of [Gallager 1968], Sec. 5.7, also appears to apply in the quantum case, resulting in the bound

$$E(R) \ge \max_{s \ge 1}\max_\pi\bigl(\tilde\mu(\pi, s) - sR\bigr) = E_{ex}(R),$$

where

$$\tilde\mu(\pi, s) = -s\log\sum_{i,k}\pi_i\pi_k|\langle\psi_i|\psi_k\rangle|^{2/s}.$$

The behavior of the lower bounds $E_r(R)$ and $E_{ex}(R)$ can be studied by the methods of classical information theory; see [Burnashev and Holevo 1997]. In particular, since $\mu'(\pi, 0) = H(S_\pi)$, it follows easily that $C \ge \max_\pi\mu'(\pi, 0) = \bar C$. Thus the rate $\bar C - \delta$ can be attained with the approximate maximum likelihood decision rule (5), (4), without projecting onto the typical subspace. We also remark that

$$\tilde\mu(\pi, 1) = \mu(\pi, 1) = -\log\operatorname{Tr} S_\pi^2, \tag{15}$$

and that the common linear portion of the functions $E_r(R)$ and $E_{ex}(R)$ is just $\mu(\pi, 1) - R$.
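The identity (15) and the relation $\mu'(\pi, 0) = H(S_\pi)$ behind $\max_\pi\mu'(\pi, 0) = \bar C$ can both be checked numerically. The Python sketch below (NumPy assumed; the three complex qubit states and the prior are arbitrary) implements $\mu$ and $\tilde\mu$ as defined above. It also includes a version of $E_{ex}(R)$ restricted to one fixed prior and a finite grid of $s \ge 1$, a simplification of the double maximization in the text.

```python
import numpy as np

def density(prior, kets):
    """S_pi = sum_i pi_i |psi_i><psi_i|."""
    return sum(p * np.outer(k, k.conj()) for p, k in zip(prior, kets))

def mu(prior, kets, s):
    """mu(pi, s) = -log2 Tr S_pi^{1+s}."""
    w = np.linalg.eigvalsh(density(prior, kets))
    w = w[w > 1e-15]
    return float(-np.log2(np.sum(w ** (1 + s))))

def mu_tilde(prior, kets, s):
    """mu~(pi, s) = -s log2 sum_{i,k} pi_i pi_k |<psi_i|psi_k>|^(2/s)."""
    total = sum(pi * pk * abs(np.vdot(ki, kk)) ** (2.0 / s)
                for pi, ki in zip(prior, kets) for pk, kk in zip(prior, kets))
    return float(-s * np.log2(total))

def expurgated_exponent_fixed_prior(R, prior, kets, s_grid):
    """Max over a finite grid of s >= 1 of mu~(pi, s) - s R, for one fixed prior."""
    return max(mu_tilde(prior, kets, s) - s * R for s in s_grid)

kets = [np.array([1.0, 0.0]), np.array([0.6, 0.8]), np.array([1.0, 1.0j]) / np.sqrt(2)]
prior = [0.2, 0.3, 0.5]
print(mu(prior, kets, 1.0), mu_tilde(prior, kets, 1.0))
print(expurgated_exponent_fixed_prior(0.05, prior, kets, [1.0, 1.5, 2.0, 4.0, 8.0]))
```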

III. General signal states with finite entropy 

The general case is substantially more complicated already at the level of the quantum Bayes problem. In particular, no upper bound for the average error probability that appropriately generalizes the geometrically simple bound (6) is known so far. The proof given in [Holevo 1996] rests instead on a noncommutative generalization of the idea of "jointly typical" sequences in classical theory [Cover and Thomas 1991]. This is realized by substituting into the average error probability (3) the decision rule

$$X_k = \Bigl(\sum_{l=1}^M PP_{w^l}P\Bigr)^{-1/2} PP_{w^k}P\,\Bigl(\sum_{l=1}^M PP_{w^l}P\Bigr)^{-1/2}, \tag{16}$$

where $P_{w^k}$ is a proper generalization of the typical projection for the density operators $S_{w^k}$. The essential properties of $P_{w^k}$ are

$$P_{w^k} \le S_{w^k}\,2^{n[H_\pi(S_{(\cdot)})+\delta]}, \tag{17}$$

$$\mathsf{E}\operatorname{Tr} S_{w^k}(I - P_{w^k}) \le \epsilon. \tag{18}$$
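The decision rule (16) has the "square-root" form $X_k = T A_k T$, with $A_k = PP_{w^k}P$ and $T = (\sum_l A_l)^{-1/2}$. The Python sketch below (NumPy assumed) builds such operators and checks that they are positive and sum to a projection, so that together with $X_0 = I - \sum_k X_k$ they form a resolution of identity. The sketch makes two assumptions. First, the inverse square root is taken on the support of $\sum_l A_l$ and set to zero on its kernel. Second, random projections stand in for the typical projections $P$ and $P_{w^k}$, whose actual construction is not reproduced here.

```python
import numpy as np

def inv_sqrt_on_support(A, tol=1e-12):
    """A^{-1/2} on the support of a positive semidefinite A, zero on its kernel."""
    w, V = np.linalg.eigh(A)
    inv = np.array([1.0 / np.sqrt(x) if x > tol else 0.0 for x in w])
    return (V * inv) @ V.conj().T

def square_root_rule(P, projectors):
    """X_k = T (P P_k P) T with T = (sum_l P P_l P)^{-1/2}, the form of rule (16)."""
    blocks = [P @ Pk @ P for Pk in projectors]
    T = inv_sqrt_on_support(sum(blocks))
    return [T @ B @ T for B in blocks]

def random_projection(rng, dim, rank):
    """Orthogonal projection onto a random rank-`rank` subspace of R^dim."""
    Q, _ = np.linalg.qr(rng.normal(size=(dim, rank)))
    return Q @ Q.T

rng = np.random.default_rng(1)
P = np.diag([1.0, 1.0, 1.0, 0.0])                       # stand-in for the typical projection P
Pw = [random_projection(rng, 4, 1) for _ in range(3)]  # hypothetical stand-ins for P_{w^k}
X = square_root_rule(P, Pw)
print(np.round(sum(X), 6))                             # a projection, hence <= I
```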

After substituting (16) into (3), one performs a number of rather laborious steps to get rid of the normalization factors in (16). This yields an expression in which the different words are "decoupled", namely the estimate

$$\min_X\lambda(W, X) \le \frac{1}{M}\sum_{k=1}^M\Bigl\{3\operatorname{Tr} S_{w^k}(I - P) + \sum_{l \ne k}\operatorname{Tr} PS_{w^k}PP_{w^l} + \operatorname{Tr} S_{w^k}(I - P_{w^k})\Bigr\}. \tag{19}$$

Taking the expectation and using (9), (12) and (18), one obtains for $n$ large enough

$$\mathsf{E}\min_X\lambda(W, X) \le 4\epsilon + (M-1)\,\|S_\pi^{\otimes n}P\|\operatorname{Tr}\mathsf{E}P_{w^l}.$$

Hence, by (11) and (17),

$$p(M, n) \le 4\epsilon + (M-1)\,2^{-n[\Delta H(\pi) - 2\delta]},$$

which implies $C \ge \bar C$. Combined with the entropy bound, this gives

Theorem 2. The capacity of the channel with $H(S_i) < \infty$ is given by

$$C = \max_\pi\Bigl[H\Bigl(\sum_i\pi_i S_i\Bigr) - \sum_i\pi_i H(S_i)\Bigr]. \tag{20}$$
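Formula (20) is a finite-dimensional concave maximization. For a two-letter alphabet, the Python sketch below evaluates it directly. It assumes NumPy; the grid search over $\pi$ and base-2 logarithms are illustrative choices, not a method from the text.

```python
import numpy as np

def vn_entropy(rho):
    """von Neumann entropy -Tr rho log2 rho, in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

def capacity_two_letters(S0, S1, grid=10001):
    """max over pi of H(pi S0 + (1-pi) S1) - pi H(S0) - (1-pi) H(S1), as in (20)."""
    h0, h1 = vn_entropy(S0), vn_entropy(S1)
    best, best_p = -np.inf, None
    for p in np.linspace(0.0, 1.0, grid):
        value = vn_entropy(p * S0 + (1 - p) * S1) - p * h0 - (1 - p) * h1
        if value > best:
            best, best_p = value, p
    return best, best_p

eps = 0.2
S0 = np.diag([1 - eps / 2, eps / 2])   # commuting example (quasiclassical channel)
S1 = np.diag([eps / 2, 1 - eps / 2])
print(capacity_two_letters(S0, S1))

theta = np.pi / 6                       # non-commuting example: rotated mixed state
v = np.array([np.cos(theta), np.sin(theta)])
S1_rot = 0.9 * np.outer(v, v) + 0.1 * np.eye(2) / 2
print(capacity_two_letters(S0, S1_rot))
```

For the commuting states $S_0 = \operatorname{diag}(1 - \epsilon/2, \epsilon/2)$ and $S_1 = \operatorname{diag}(\epsilon/2, 1 - \epsilon/2)$, the result coincides with the capacity $1 - h(\epsilon/2)$ of the classical binary symmetric channel, where $h$ is the binary entropy. This is as expected for a quasiclassical channel.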

For a quasiclassical channel, where the signal states are given by commuting density operators $S_i$, one can use the classical bound of Theorem 5.6.1 of [Gallager 1968] with transition probabilities $S(\omega|i)$, the eigenvalues of $S_i$. In terms of the density operators it takes the form

$$\mathsf{E}\min_X\lambda(W, X) \le \min_{0 < s \le 1}(M-1)^s\Bigl[\operatorname{Tr}\Bigl(\sum_i\pi_i S_i^{1/(1+s)}\Bigr)^{1+s}\Bigr]^n. \tag{21}$$

The right-hand side of (21) is meaningful for arbitrary density operators. This suggests that the estimate could be generalized to the noncommutative case (note that for pure states $S_i$, Theorem 1 gives twice the expression (21)). Such a generalization would not only give a different proof of Theorem 2, but also a lower bound for the quantum reliability function for general signal states, possibly with infinite entropy.
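For commuting $S_i$, the right-hand side of (21) is Gallager's random coding bound written through the diagonals $S(\omega|i)$. The Python sketch below computes the exponent $-\log\operatorname{Tr}(\sum_i\pi_iS_i^{1/(1+s)})^{1+s}$ and the bound (21) minimized over a finite grid of $s$. It assumes NumPy, uses base-2 logarithms, and represents commuting density operators by diagonal matrices.

```python
import numpy as np

def gallager_e0(prior, states, s):
    """-log2 Tr (sum_i pi_i S_i^{1/(1+s)})^{1+s} for commuting, here diagonal,
    density operators S_i whose diagonals are the transition probabilities S(w|i)."""
    diag = np.array([np.real(np.diag(S)) for S in states])   # row i: S(.|i)
    inner = np.asarray(prior) @ diag ** (1.0 / (1.0 + s))
    return float(-np.log2(np.sum(inner ** (1.0 + s))))

def bound_21(M, n, prior, states, s_grid):
    """Right-hand side of (21), minimized over a finite grid of s in (0, 1]."""
    return min((M - 1) ** s * 2.0 ** (-n * gallager_e0(prior, states, s)) for s in s_grid)

bsc = [np.diag([0.9, 0.1]), np.diag([0.1, 0.9])]   # binary symmetric channel, crossover 0.1
print(bound_21(M=2 ** 20, n=100, prior=[0.5, 0.5], states=bsc,
               s_grid=np.linspace(0.05, 1.0, 20)))
```

The derivative of the exponent at $s = 0$ equals $\Delta H(\pi)$, the classical mutual information of the quasiclassical channel. This is why (21) yields the direct coding theorem $C \ge \bar C$ in the commuting case.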
IV. Quantum channels with constrained inputs



In classical information theory, direct coding theorems for channels with additive constraints are proved by random coding with the probability distribution (8) modified by a factor concentrated on words for which the constraint holds close to equality [Gallager 1968], Sec. 7.3. The same tool applies to quantum channels [Holevo 1997]. For definiteness, in this section the input alphabet $A$ is an arbitrary Borel subset of a finite-dimensional Euclidean space. The channel is given by a weakly continuous mapping $x \to S_x$ from the input alphabet $A$ to the set of density operators in $\mathcal{H}$. We fix a continuous function $f$ on $A$ and consider the set $\mathcal{P}_1$ of probability measures $\pi$ on $A$ satisfying

$$\int_A f(x)\,\pi(dx) \le E. \tag{22}$$

For arbitrary $\pi \in \mathcal{P}_1$ consider the quantity

$$\Delta H(\pi) = H(\bar S_\pi) - \int_A H(S_x)\,\pi(dx), \tag{23}$$

where $\bar S_\pi = \int_A S_x\,\pi(dx)$. Assuming the condition

$$\sup_{\pi \in \mathcal{P}_1} H(\bar S_\pi) < \infty, \tag{24}$$

we denote

$$\bar C = \sup_{\pi \in \mathcal{P}_1}\Delta H(\pi). \tag{25}$$

Let $p(M, n)$ denote the infimum of the average error probability over all codes of size $M$ with words $w = (x_1, \dots, x_n)$ satisfying the additive constraint

$$f(x_1) + \dots + f(x_n) \le nE. \tag{26}$$

Theorem 3. Under the condition (24), the capacity of the channel with the input constraint (26) is given by (25), i.e. for $\delta > 0$, $p(e^{n(\bar C-\delta)}, n) \to 0$ and $p(e^{n(\bar C+\delta)}, n) \not\to 0$ as $n \to \infty$.
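For a finite input alphabet, the constrained quantity (25) can be approximated by a grid search over the probability simplex. In the Python sketch below (NumPy assumed), the three qubit states and the cost function $f(x) = x$ are hypothetical. Natural logarithms are used to match the rate $e^{n(\bar C - \delta)}$ in Theorem 3, and the finite alphabet simplifies the Borel-subset setting of the text.

```python
import itertools
import numpy as np

def vn_entropy(rho):
    """von Neumann entropy in nats."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def constrained_capacity(states, costs, E, step=0.01):
    """sup of Delta H(pi) over distributions on a finite alphabet with
    sum_x pi_x f(x) <= E, as in (22)-(25), by grid search over the simplex."""
    m = int(round(1 / step))
    best, best_pi = -np.inf, None
    for counts in itertools.product(range(m + 1), repeat=len(states) - 1):
        if sum(counts) > m:
            continue
        pi = np.array(list(counts) + [m - sum(counts)]) / m
        if pi @ costs > E + 1e-12:          # input constraint (22)
            continue
        avg = sum(p * S for p, S in zip(pi, states))
        value = vn_entropy(avg) - sum(p * vn_entropy(S) for p, S in zip(pi, states))
        if value > best:
            best, best_pi = value, pi
    return best, best_pi

plus = np.array([1.0, 1.0]) / np.sqrt(2)
states = [np.diag([1.0, 0.0]), np.outer(plus, plus), np.diag([0.0, 1.0])]
costs = np.array([0.0, 1.0, 2.0])           # hypothetical cost function f(x) = x
print(constrained_capacity(states, costs, E=0.5))
```

With $E = 1$, the uniform mixture of $|0\rangle\langle0|$ and $|1\rangle\langle1|$ is admissible and attains $\ln 2$. Tightening the constraint lowers the value.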

The proof uses the inequality (19) with random coding modified as described in [Gallager 1968], Sec. 7.3. The same method, combined with the estimate (14) for pure state channels, gives a lower bound for the reliability function, modified with the factor $\mathrm{const}\cdot e^{r[f(x)-E]}$ with $r > 0$.
Applied to quantum memoryless Gaussian channels with the energy constraint [Holevo 1997], Theorem 3 makes it possible to prove for the first time their asymptotic equivalence, in the sense of information capacity, to the corresponding quasiclassical "photon channels". These channels have been studied extensively since the origin of quantum communications [Gordon 1964], [Lebedev and Levitin 1966], [Caves and Drummond 1994]. The equivalence plausibly extends to waveform channels as well. In particular, the infinite-band photon channel capacity [Lebedev and Levitin 1966]

$$C = \frac{1}{\sqrt{6\hbar}}\left(\sqrt{N + E} - \sqrt{N}\right)$$

is plausibly equal to the properly defined capacity of the quantum Gaussian channel

$$Y(t) = x(t) + Z(t), \qquad t \in [0, T], \quad T \to \infty.$$

Here $x(t)$ is the classical signal subject to the energy constraint $\int_0^T x(t)^2\,dt \le ET$, and $Z(t)$ is the equilibrium quantum Gaussian noise. The noise has the commutator

$$[Z(t), Z(s)] = i\hbar\int_0^\infty \omega\sin\omega(t - s)\,d\omega = -i\hbar\pi\,\delta'(t - s),$$

zero mean, and the correlation function $\langle Z(t)Z(s)\rangle = B_N(t - s) + K(t - s)$, with

$$B_N(t) = \hbar\int_0^\infty \frac{\omega\cos t\omega}{e^{\beta\hbar\omega} - 1}\,d\omega = \hbar\sum_{k=1}^\infty \operatorname{Re}\frac{1}{(k\beta\hbar + it)^2}.$$

Here $N$ and $\beta$ are related by $N = B_N(0) = \pi^2/(6\hbar\beta^2)$, and

$$K(t) = \frac{\hbar}{2}\int_0^\infty \omega e^{i\omega t}\,d\omega = -\frac{\hbar}{2}\left[t^{-2} + i\pi\delta'(t)\right]$$

is the zero-temperature correlation ($B_0(t) = 0$). However, a complete proof is still lacking.
V. Some further problems



This paper has been devoted entirely to "classical-quantum" channels, in the terminology of [Holevo 1977]. Even in this case there are open problems, some of which were mentioned above. Such channels can alternatively be described by (completely) positive maps from the noncommutative algebra of operators in $\mathcal{H}$ to the commutative algebra of functions on the input alphabet. More general "quantum-quantum" channels are described by completely positive maps between noncommutative algebras. The definition of capacity and the quantum entropy bound generalize to this case [Holevo 1977], [Ohya and Petz 1993]. However, such channels raise the new and difficult problem of optimizing over coding maps. In particular, it is not yet known whether the entropy bound optimized in this way is an additive function on the product channel. The paper by Bennett, Fuchs and Smolin [Bennett et al. 1996] contains an interesting preliminary investigation of this situation.

All these problems concern the transmission of classical information through quantum channels. There is a yet "more quantum" domain of problems: the reliable transmission of entire quantum states under a given fidelity criterion. Even the definition of the relevant "quantum information" is far from obvious. Barnum, Nielsen and Schumacher [Barnum et al. 1997] made important steps in this direction; in particular, they suggested a tentative converse of the relevant coding theorem. A proof of the corresponding direct theorem, however, remains an open question.